64-bit scaled sum-of-product operations in a 32-bit environment

ABSTRACT

Logic for performing 64-bit scaled sum-of-product operations in a 32-bit environment accesses a first 32-bit number, a second 32-bit number, and a shift number in a first operation. The logic multiplies the first 32-bit number by the second 32-bit number. The resulting product includes a first 64-bit number that includes a most significant 32-bit portion and a least significant 32-bit portion. The logic right-shifts the least significant 32-bit portion of the first 64-bit number according to the shift number, accesses a least significant 32-bit portion of a second 64-bit number, and adds the right-shifted least significant 32-bit portion of the first 64-bit number to the least significant 32-bit portion of the second 64-bit number. The resulting sum includes a least significant 32-bit portion of a final result of a 64-bit scaled sum-of-product operation and a carry bit. In a second operation, the logic multiplies the first 32-bit number by the second 32-bit number. The resulting product includes the first 64-bit number. The logic right-shifts the most significant 32-bit portion of the first 64-bit number according to the shift number and accesses a most significant 32-bit portion of the second 64-bit number. The logic adds the most significant 32-bit portion of the second 64-bit number and the carry bit to the right-shifted most significant 32-bit portion of the first 64-bit number. The resulting sum includes a most significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation.

TECHNICAL FIELD OF THE INVENTION

[0001] This invention relates generally to processor operations and more particularly to 64-bit scaled sum-of-product operations in a 32-bit environment.

BACKGROUND OF THE INVENTION

[0002] There are drawbacks associated with traditional 64-bit scaled sum-of-product operations. In traditional 64-bit scaled sum-of-product operations, scaling operations and sum-of-product operations may need to be performed separately, which may increase the number of instructions needed for such operations (and, therefore, the number of cycles associated with executing such operations). Other traditional 64-bit scaled sum-of-product operations may require 64-bit adders and 64-bit registers (or 32-bit register pairs) or, where such adders are unavailable, multiple microcycles using 32-bit adders. Such operations may decrease silicon efficiency, adversely affect processor performance, or both.

SUMMARY OF THE INVENTION

[0003] Particular embodiments of the present invention may reduce or eliminate disadvantages and problems traditionally associated with 64-bit scaled sum-of-product operations in 32-bit environments.

[0004] In one embodiment of the present invention, logic for performing 64-bit scaled sum-of-product operations in a 32-bit environment accesses a first 32-bit number, a second 32-bit number, and a shift number in a first operation. The logic multiplies the first 32-bit number by the second 32-bit number. The resulting product includes a first 64-bit number that includes a most significant 32-bit portion and a least significant 32-bit portion. The logic right-shifts the least significant 32-bit portion of the first 64-bit number according to the shift number. The logic accesses a least significant 32-bit portion of a second 64-bit number and adds the right-shifted least significant 32-bit portion of the first 64-bit number to the least significant 32-bit portion of the second 64-bit number. The resulting sum includes a least significant 32-bit portion of a final result of a 64-bit scaled sum-of-product operation and further includes a carry bit. The logic stores the least significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation and stores the carry bit. In a second operation, the logic accesses the first 32-bit number, the second 32-bit number, and the shift number. The logic multiplies the first 32-bit number by the second 32-bit number. The resulting product includes the first 64-bit number. The logic right-shifts the most significant 32-bit portion of the first 64-bit number according to the shift number. The logic accesses a most significant 32-bit portion of the second 64-bit number and accesses the carry bit. The logic adds the most significant 32-bit portion of the second 64-bit number and the carry bit to the right-shifted most significant 32-bit portion of the first 64-bit number. The resulting sum includes a most significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation. The logic stores the most significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation.

[0005] Particular embodiments of the present invention may provide one or more technical advantages. Particular embodiments may perform 64-bit scaled sum-of-product operations in a 32-bit environment. Particular embodiments may perform 64-bit scaled sum-of-product operations using a 32-bit adder instead of a 64-bit adder. In particular embodiments, scaling operations may be performed in conjunction with sum-of-product operations. Particular embodiments may use less circuitry, decrease time requirements associated with 64-bit scaled sum-of-product operations, increase silicon efficiency, and improve processor performance. Certain embodiments may provide all, some, or none of these technical advantages, and certain embodiments may provide one or more other technical advantages which may be readily apparent to those skilled in the art from the figures, descriptions, and claims included herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] To provide a more complete understanding of the present invention and the features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings, in which:

[0007] FIG. 1 illustrates an example processor system;

[0008] FIG. 2 illustrates execution of an example IMPYL instruction;

[0009] FIG. 3 illustrates execution of an example ADDUL instruction;

[0010] FIG. 4 illustrates execution of an example QMPYL instruction; and

[0011] FIG. 5 illustrates execution of an example ADDCL instruction.

DESCRIPTION OF EXAMPLE EMBODIMENTS

[0012] FIG. 1 illustrates an example processor system 10, which may include a digital signal processor (DSP). Although a particular processor system 10 is described and illustrated, the present invention contemplates any suitable processor system 10 including any suitable architecture. Processor system 10 may include program memory 12, data memory 14, and processor 16. Program memory 12 may be used to store program instructions for operations executed by processor 16, and data memory 14 may be used to store data used in operations executed by processor 16. Data (which may include program instructions, data used in operations executed by processor 16, or any other suitable data) may be communicated between processor 16 and program memory 12 and between processor 16 and data memory 14 using data buses 18, which may include any suitable physical medium for such communication. For example, data buses 18 may include one or more wires coupling processor 16 to program memory 12 and data memory 14. The number of bits that may be communicated across a data bus 18 in one clock cycle (which may include a unit of time between two adjacent pulses of a clock signal for processor system 10) may be limited. For example, in a 32-bit environment, a maximum of thirty-two bits may be communicated across each data bus 18 in one clock cycle. Data addresses (which may specify locations for data within program memory 12, data memory 14, or elsewhere and may, where appropriate, include the locations themselves) may be communicated between processor 16 and program memory 12 and between processor 16 and data memory 14 using address buses 20, which may include any suitable physical medium for such communication. For example, address buses 20 may include one or more wires coupling processor 16 with program memory 12 and data memory 14. Similar to data buses 18, the number of bits that may be communicated across an address bus 20 in one clock cycle may be limited.

[0013] Processor 16 may execute mathematical, logical, and any other suitable operations and may, for example only and not by way of limitation, include one or more shifters 22, arithmetic-logic units (ALUs) 24, multipliers 26, data registers 28, instruction caches 30, program sequencers 32, and data address generators 34. Although a particular processor 16 is described and illustrated, the present invention contemplates any suitable processor 16 including any suitable components. Shifter 22 may be used to left- or right-shift data units and perform other suitable tasks. ALU 24 may be used for addition, subtraction, absolute value operations, logical operations (such as, for example, AND, OR, NAND, NOR, and NOT operations), and other suitable tasks. Multiplier 26 may be used for multiplication and other suitable tasks. In a 32-bit environment, shifter 22, ALU 24, and multiplier 26 may each process a maximum of thirty-two bits in one clock cycle. For example, ALU 24 may in one clock cycle add numbers that include at most thirty-two bits. To add numbers that include more than thirty-two bits, the numbers may be divided into parts that each include thirty-two or fewer bits and added in parts. Registers 28 may include a number of memory locations for storing intermediate operation results, flags for program control, and the like. For example, registers 28 may include one or more general data registers, temporary registers, condition code registers (CCRs), status registers (SRs), address registers, and other suitable registers. In a 32-bit environment, each register 28 may be used to store a maximum of thirty-two bits. Instruction cache 30 may be used to store one or more program instructions for recurring operations. For example, program instructions for one or more operations that are part of a loop of operations executed by processor 16 may be stored using instruction cache 30 such that program memory 12 need not be accessed each time a program instruction for one or more of the operations is to be executed. Program sequencer 32 may direct the execution of operations by processor 16 and perform other suitable tasks. Data address generators 34 may communicate addresses to program memory 12 and data memory 14 specifying memory locations within program memory 12 and data memory 14 from which data may be read and to which data may be written. Although particular components of processor 16 are described as performing particular tasks, any suitable components of processor 16, alone or in combination, may perform any suitable tasks. In addition, although the components of processor 16 are described and illustrated as separate components, any suitable component of processor 16 may be wholly or partly incorporated into one or more other components of processor 16.
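
As an illustration only, the addition of numbers wider than thirty-two bits in 32-bit parts, as mentioned above for ALU 24, might be sketched as follows. The helper names `add32` and `add64_in_parts` are hypothetical and do not appear in the description; the sketch assumes unsigned 64-bit values that wrap around on overflow.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b, carry_in=0):
    """Model a 32-bit adder: return (32-bit sum, carry-out bit)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def add64_in_parts(a, b):
    """Add two 64-bit values using only 32-bit additions."""
    # Low halves first; the carry-out feeds the high-half addition.
    lo, carry = add32(a, b)
    hi, _ = add32(a >> 32, b >> 32, carry)
    return (hi << 32) | lo
```

The carry produced by the low-half addition is the only information that must pass from one 32-bit addition to the next.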

[0014] Processor system 10 may be used to perform 64-bit scaled sum-of-product operations. In such operations, one 32-bit number may be multiplied by another 32-bit number and the resulting 64-bit product may be right-shifted a particular number of bits and added to a 64-bit number. In a 32-bit environment including 32-bit registers 28, 32-bit ALUs 24, and 32-bit shifters 22, 64-bit scaled sum-of-product operations may be performed in parts. In particular embodiments, such operations may be performed in four parts, each of which may include a single processor operation. Two of the parts may together generate the least significant thirty-two bits of the final result, and two of the parts may together generate the most significant thirty-two bits of the final result. In particular embodiments, as described more fully below, 64-bit scaled sum-of-product operations may alternatively be performed in two parts, each of which may include a single processor operation.

[0015] In the first part of a four-part 64-bit scaled sum-of-product operation, a first 32-bit number may be multiplied by a second 32-bit number, the resulting 64-bit product may be right-shifted a particular number of bits, and the least significant thirty-two bits of the right-shifted 64-bit product may be stored. Thus, the first part of a four-part 64-bit scaled sum-of-product operation may be described as follows:

[0016] P(31:0)=(M(31:0)*X(31:0))>>Scale

[0017] M and X may include 32-bit numbers, Scale may include the number of bits by which the 64-bit product of the two 32-bit numbers is right-shifted, and P may include the least significant thirty-two bits of the right-shifted 64-bit product of the two 32-bit numbers.

[0018] The first part may, as an example only and not by way of limitation, be implemented using an instruction for which there are four operands, which instruction may be referred to as Integer Multiply Long (IMPYL) and described as follows:

[0019] IMPYL Reg,SrcA,SrcB,Scale; Reg=(SrcA*SrcB)>>Scale

[0020] SrcA and SrcB may include 32-bit numbers and may be stored in registers 28, memory locations within data memory 14, or other suitable locations. Scale may include the number of bits by which the 64-bit product of SrcA and SrcB is right-shifted. Scale may be stored in a register 28 or other suitable location or alternatively include an immediate operand that may be passed to one or more components of processor 16 by the IMPYL instruction. Reg may, after execution of an IMPYL instruction, include the least significant thirty-two bits of the right-shifted 64-bit product of SrcA and SrcB and may be stored in a register 28 or other suitable location. Herein, reference to a particular operand may include the operand itself or, where appropriate, the memory location of the operand. Similarly, reference to a particular memory location may include the memory location itself or, where appropriate, the operand stored at the memory location. When executed, an IMPYL instruction may multiply SrcA by SrcB, right-shift the resulting product by Scale bits (which shift may include a logical shift), and store the least significant thirty-two bits of the right-shifted product.

[0021] FIG. 2 illustrates execution of an example IMPYL instruction. Execution of the instruction may begin at step 100, where SrcA and SrcB are accessed. At step 102, SrcA is multiplied by SrcB, resulting in a 64-bit product. At step 104, Scale is accessed. As described above, Scale may include an immediate operand and may thus be passed to one or more components of processor 16 by the IMPYL instruction. At step 106, the 64-bit product from step 102 is right-shifted by Scale bits. At step 108, the least significant thirty-two bits of the right-shifted product of SrcA and SrcB are stored in Reg, at which point execution of the IMPYL instruction may end.
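
As an example only and not by way of limitation, the steps of FIG. 2 might be modeled in software as follows. The function names `to_signed32` and `impyl` are hypothetical, and the sketch assumes that SrcA and SrcB are signed two's-complement 32-bit numbers and that Scale lies between zero and thirty-two; the description does not state either point explicitly.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement integer (assumption)."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def impyl(src_a, src_b, scale):
    """Model of IMPYL: Reg = (SrcA * SrcB) >> Scale, low 32 bits kept."""
    # Step 102: form the 64-bit product.
    product = (to_signed32(src_a) * to_signed32(src_b)) & MASK64
    # Step 106: logical right shift of the 64-bit product by Scale bits.
    shifted = product >> scale
    # Step 108: keep only the least significant thirty-two bits.
    return shifted & MASK32
```

Because only the least significant thirty-two bits are kept, a logical shift and an arithmetic shift give the same stored result whenever Scale does not exceed thirty-two.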

[0022] In the second part of a four-part 64-bit scaled sum-of-product operation, the least significant thirty-two bits of the right-shifted product of the two 32-bit numbers may be added to the least significant thirty-two bits of a 64-bit number, which addition may include an unsigned addition. The resulting sum may include the least significant thirty-two bits of the final result of the 64-bit scaled sum-of-product operation and, potentially, a carry bit. The least significant thirty-two bits of the final result of the 64-bit scaled sum-of-product operation and the generated carry bit (if a carry was generated) may be stored. Thus, the second part of a four-part 64-bit scaled sum-of-product operation may be described as follows:

[0023] Carry:Y(31:0)=B(31:0)+P(31:0)

[0024] B may include the least significant thirty-two bits of the 64-bit number, and P may include the least significant thirty-two bits of the right-shifted 64-bit product of the two 32-bit numbers. Y may include the least significant thirty-two bits of the final result of the 64-bit scaled sum-of-product operation. Carry may include a carry bit (which may include a bit more significant than the most significant bit of the least significant thirty-two bits of the final result).

[0025] The second part may, as an example only and not by way of limitation, be implemented using an instruction for which there are two operands, which instruction may be referred to as Add Unsigned Long (ADDUL) and described as follows:

[0026] ADDUL Reg,SrcC; Reg=Reg+SrcC (unsigned)

[0027] ; Set C if carry generated

[0028] SrcC may include the least significant thirty-two bits of the 64-bit number and may be stored in a register 28, a memory location within data memory 14, or another suitable location. Reg may, at the outset of the execution of an ADDUL instruction, include the least significant thirty-two bits of a right-shifted 64-bit product of the two 32-bit numbers from execution of the preceding IMPYL instruction and may, after the execution of the ADDUL instruction, include the least significant thirty-two bits of the final result of the 64-bit scaled sum-of-product operation. Reg may be stored in a register 28 or other suitable location. C may include a carry bit and may be stored in a status register 28.

[0029] FIG. 3 illustrates execution of an example ADDUL instruction. Execution may begin at step 120, where Reg is accessed. As described above, Reg may, at the outset of the execution of the ADDUL instruction, include the least significant thirty-two bits of the right-shifted 64-bit product of the two 32-bit numbers from execution of the preceding IMPYL instruction. At step 122, SrcC is accessed. At step 124, Reg is added to SrcC. At step 126, the resulting sum of Reg and SrcC is stored in Reg. At step 128, if the addition of Reg to SrcC generated a carry, execution of the ADDUL instruction proceeds to step 130. At step 130, a carry bit may be set to one, at which point execution of the ADDUL instruction may end. The carry bit may be stored in a status register 28. At step 128, if the addition of Reg to SrcC did not generate a carry, execution of the ADDUL instruction proceeds to step 132. At step 132, the carry bit may be set to zero, at which point execution of the ADDUL instruction may end.
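
As an example only, the unsigned addition and carry generation of FIG. 3 might be sketched as follows. The function name `addul` and the use of a returned tuple in place of a status register are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def addul(reg, src_c):
    """Model of ADDUL: Reg = Reg + SrcC (unsigned); return (Reg, C)."""
    # Step 124: unsigned 32-bit addition, kept at full width for now.
    total = (reg & MASK32) + (src_c & MASK32)
    # Step 126: the low thirty-two bits go back to Reg.
    new_reg = total & MASK32
    # Steps 128-132: C is one if the sum overflowed thirty-two bits, else zero.
    carry = total >> 32
    return new_reg, carry
```

In this sketch the carry is always zero or one, since the sum of two 32-bit words fits in thirty-three bits.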

[0030] In the third part of a four-part 64-bit scaled sum-of-product operation, the first 32-bit number may again be multiplied by the second 32-bit number and the resulting product may be right-shifted thirty-two bits and stored. Thus, the third part of a four-part 64-bit scaled sum-of-product operation may be described as follows:

[0031] P(31:0)=(M(31:0)*X(31:0))>>32

[0032] M and X may, as described above, include 32-bit numbers, and P may include the most significant thirty-two bits of the 64-bit product of the two 32-bit numbers. The third part may, as an example only and not by way of limitation, be implemented using an instruction for which there are three operands, which instruction may be referred to as QMPYL and described as follows:

[0033] QMPYL Reg,SrcA,SrcB; Reg=(SrcA*SrcB)>>32

[0034] SrcA and SrcB, as described above, may include 32-bit numbers and may be stored in registers 28, memory locations within data memory 14, or other suitable locations. Reg may, after execution of a QMPYL instruction, include the most significant thirty-two bits of the 64-bit product of SrcA and SrcB and may be stored in a register 28 or other suitable location. When executed, a QMPYL instruction may multiply SrcA by SrcB, right-shift the resulting product by thirty-two bits, and store the most significant thirty-two bits of the 64-bit product. FIG. 4 illustrates execution of an example QMPYL instruction. Execution of the instruction may begin at step 160, where SrcA and SrcB are accessed. At step 162, SrcA is multiplied by SrcB, resulting in a 64-bit product. At step 164, the 64-bit product from step 162 is right-shifted by thirty-two bits. At step 166, the most significant thirty-two bits of the product of SrcA and SrcB are stored in Reg, at which point execution of the QMPYL instruction may end.
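
As an example only, the steps of FIG. 4 might be modeled as follows. The names `to_signed32` and `qmpyl` are hypothetical, and signed two's-complement operands are assumed.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement integer (assumption)."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def qmpyl(src_a, src_b):
    """Model of QMPYL: Reg = (SrcA * SrcB) >> 32."""
    # Step 162: form the 64-bit product.
    product = (to_signed32(src_a) * to_signed32(src_b)) & MASK64
    # Steps 164-166: shift right by thirty-two bits and keep the upper word.
    return (product >> 32) & MASK32
```

For a negative product the returned word has its top bit set, which is what allows a later arithmetic shift to extend the sign correctly.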

[0035] In the fourth part of a four-part 64-bit scaled sum-of-product operation, the most significant thirty-two bits of the 64-bit product of the two 32-bit numbers may be right-shifted by a particular number of bits. The most significant thirty-two bits of the 64-bit number and a carry bit from execution of the preceding ADDUL instruction (if a carry was generated) may subsequently be added to the right-shifted most significant thirty-two bits of the 64-bit product of the two 32-bit numbers, and the resulting sum may be stored. The resulting sum may include the most significant thirty-two bits of the final result of the 64-bit scaled sum-of-product operation. Thus, the fourth part of a four-part 64-bit scaled sum-of-product operation may be described as follows:

[0036] Y(63:32)=B(63:32)+(P(31:0)>>Scale)+Carry

[0037] B may include the most significant thirty-two bits of the 64-bit number, and P may include the most significant thirty-two bits of the 64-bit product of the two 32-bit numbers. Scale may include the number of bits by which the most significant thirty-two bits of the 64-bit product of the two 32-bit numbers are right-shifted. Y may include the most significant thirty-two bits of the final result of the 64-bit scaled sum-of-product operation. Carry may include a carry bit generated by the addition, in the preceding second part of the 64-bit scaled sum-of-product operation, of the least significant thirty-two bits of the right-shifted 64-bit product of the two 32-bit numbers to the least significant thirty-two bits of the 64-bit number.

[0038] The fourth part may, as an example only and not by way of limitation, be implemented using an instruction for which there are three operands, which instruction may be referred to as Add Carry Long (ADDCL) and described as follows:

[0039] ADDCL Reg,SrcD,Scale; Reg=(Reg>>Scale)+SrcD+C

[0040] SrcD may include the most significant thirty-two bits of the 64-bit number and may be stored in a register 28, a memory location within data memory 14, or another suitable location. C may include a carry bit from execution of the preceding ADDUL instruction generated by the addition of the least significant thirty-two bits of the right-shifted 64-bit product of the two 32-bit numbers to the least significant thirty-two bits of the 64-bit number. As described above, C may be stored in a status register or other suitable location. Scale may include the number of bits by which Reg is right-shifted and may be stored in a register 28 or other suitable location or alternatively include an immediate operand. Reg may, at the outset of the execution of an ADDCL instruction, include the most significant thirty-two bits of the 64-bit product of the two 32-bit numbers from execution of the preceding QMPYL instruction and may, after the execution of the ADDCL instruction, include the most significant thirty-two bits of the final result of the 64-bit scaled sum-of-product operation. Reg may be stored in a register 28 or other suitable location. When executed, an ADDCL instruction may right-shift Reg a particular number of bits (which shift may include an arithmetic shift), add SrcD and C to Reg, and store the resulting sum.

[0041] FIG. 5 illustrates execution of an example ADDCL instruction. Execution may begin at step 180, where Reg may be accessed. As described above, Reg may, at the outset of the execution of the ADDCL instruction, include the most significant thirty-two bits of the 64-bit product of the two 32-bit numbers from execution of the preceding QMPYL instruction. At step 182, Scale may be accessed. As described above, Scale may include an immediate operand and may thus be passed to one or more components of processor 16 by the ADDCL instruction. At step 184, Reg may be right-shifted by Scale bits, which shift may include an arithmetic shift. At step 186, C may be accessed. As described above, C may include a carry bit from execution of the preceding ADDUL instruction. At step 188, SrcD may be accessed. At step 190, C and SrcD may be added to Reg. At step 192, the resulting sum may be stored in Reg, at which point execution of the ADDCL instruction may end.
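
As an example only, the steps of FIG. 5 might be modeled as follows. The names `to_signed32` and `addcl` are hypothetical; the carry is passed as an argument rather than read from a status register, and the shift is assumed to be an arithmetic shift on a two's-complement word.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement integer (assumption)."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def addcl(reg, src_d, scale, carry):
    """Model of ADDCL: Reg = (Reg >> Scale) + SrcD + C."""
    # Step 184: arithmetic right shift, so the sign bit is replicated.
    shifted = to_signed32(reg) >> scale
    # Steps 186-190: add SrcD and the carry bit from the preceding ADDUL.
    total = shifted + (src_d & MASK32) + carry
    # Step 192: the result is stored as a 32-bit word.
    return total & MASK32
```

Python's `>>` on a negative integer already behaves as an arithmetic shift, which is why the word is converted to a signed value before shifting.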

[0042] As an example only and not by way of limitation, the instructions IMPYL, ADDUL, QMPYL, and ADDCL may together be used to implement a linear equation of the form Y=B+((M*X)>>Scale) as follows. Y and B may include 64-bit numbers, M and X may include 32-bit numbers, and Scale may include the number of bits by which the product of M and X is right-shifted.

    MOV   RegB,#Scale        ; Initialize scale value
                             ; Calculate low part
    IMPYL RegA,@M,@X,RegB    ; RegA = (M * X) >> RegB
    ADDUL RegA,@Blow         ; RegA = RegA + Blow, set C if carry generated
    MOVL  @Ylow,RegA         ; Ylow = RegA
                             ; Calculate high part
    QMPYL RegA,@M,@X         ; RegA = (M * X) >> 32
    ADDCL RegA,@Bhigh,RegB   ; RegA = (RegA >> RegB) + Bhigh + C
    MOVL  @Yhigh,RegA        ; Yhigh = RegA
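
As an example only, the four-part computation of Y=B+((M*X)>>Scale) might be checked in software against a direct 64-bit computation as follows. The function names are hypothetical, and the sketch assumes signed two's-complement M and X, a Scale between zero and thirty-two, and a 64-bit result that wraps around on overflow.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement integer (assumption)."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def scaled_sop_reference(m, x, b, scale):
    """Direct 64-bit computation of Y = B + ((M * X) >> Scale)."""
    return (b + ((to_signed32(m) * to_signed32(x)) >> scale)) & MASK64

def scaled_sop_in_parts(m, x, b, scale):
    """Same computation using only 32-bit shifts and additions."""
    product = (to_signed32(m) * to_signed32(x)) & MASK64
    # Part 1 (IMPYL): low word of the right-shifted product.
    p_lo = (product >> scale) & MASK32
    # Part 2 (ADDUL): unsigned add of the low words, producing a carry.
    total = p_lo + (b & MASK32)
    y_lo, carry = total & MASK32, total >> 32
    # Part 3 (QMPYL): high word of the unshifted product.
    p_hi = product >> 32
    # Part 4 (ADDCL): arithmetic shift of the high word, then add with carry.
    y_hi = ((to_signed32(p_hi) >> scale) + ((b >> 32) & MASK32) + carry) & MASK32
    return (y_hi << 32) | y_lo
```

For instance, with M=3, X=4, B=10, and Scale=2, both functions return 13. The two agree under the stated assumptions because bits Scale through Scale+31 of the product form the low word of the shifted product, and an arithmetic shift of the high word yields the high word of the shifted product.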

[0043] As an alternative to performing 64-bit scaled sum-of-product operations in four parts, such operations may in particular embodiments be performed in two operations, each of which may include a single processor operation. For example, the first and second parts described above may be combined into a single operation and the third and fourth parts described above may be combined into a single operation. Such operations may be repeatable and may provide for the efficient implementation of multiple sum-of-product algorithms. As an example only and not by way of limitation, the first and second parts and the third and fourth parts may be implemented, respectively, using the following instructions:

    IMACL Reg,SrcA,SrcB,Scale ; First and second parts
    QMACL Reg,SrcA,SrcB,Scale ; Third and fourth parts

[0044] These instructions may in one or more ways resemble Multiply and Accumulate (MAC) instructions typically supported by DSP devices, but may differ from such instructions in that IMACL and QMACL may in combination carry out the addition of a 64-bit number to a scaled product of two 32-bit numbers.
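
As an example only, one possible reading of the two-operation form might be sketched as follows. The description does not specify the exact operand semantics of IMACL and QMACL, so the sketch assumes that Reg serves as the accumulator word, that the carry is passed explicitly, and that operands are signed two's-complement numbers; the function names `imacl`, `qmacl`, and `accumulate` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement integer (assumption)."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def imacl(acc_lo, src_a, src_b, scale):
    """First and second parts combined: return (new low word, carry)."""
    product = (to_signed32(src_a) * to_signed32(src_b)) & MASK64
    total = (acc_lo & MASK32) + ((product >> scale) & MASK32)
    return total & MASK32, total >> 32

def qmacl(acc_hi, src_a, src_b, scale, carry):
    """Third and fourth parts combined: return the new high word."""
    product = (to_signed32(src_a) * to_signed32(src_b)) & MASK64
    hi_shifted = to_signed32(product >> 32) >> scale
    return (hi_shifted + (acc_hi & MASK32) + carry) & MASK32

def accumulate(pairs, scale):
    """Sum scaled products into a 64-bit accumulator held as two words."""
    acc_lo = acc_hi = 0
    for a, b in pairs:
        acc_lo, carry = imacl(acc_lo, a, b, scale)
        acc_hi = qmacl(acc_hi, a, b, scale, carry)
    return (acc_hi << 32) | acc_lo
```

In this sketch each term takes one `imacl` call followed by one `qmacl` call, with the carry consumed before the next term begins.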

[0045] Particular embodiments of the present invention may provide one or more technical advantages. Particular embodiments may perform 64-bit scaled sum-of-product operations using a 32-bit adder instead of a 64-bit adder. In particular embodiments, scaling operations may be performed in conjunction with sum-of-product operations. Particular embodiments may use a minimal amount of circuitry, decrease time requirements associated with 64-bit scaled sum-of-product operations, increase silicon efficiency, or improve processor performance. Certain embodiments may provide all, some, or none of these technical advantages, and certain embodiments may provide one or more other technical advantages.

[0046] Although the present invention has been described with several embodiments, sundry changes, substitutions, variations, alterations, and modifications may be suggested to one skilled in the art, and it is intended that the invention may encompass all such changes, substitutions, variations, alterations, and modifications falling within the spirit and scope of the appended claims.

What is claimed is:
1. Logic for performing 64-bit scaled sum-of-product operations in a 32-bit environment, the logic encoded in media and when executed operable to: in a first operation: access a first 32-bit number; access a second 32-bit number; access a shift number; multiply the first 32-bit number by the second 32-bit number, the resulting product comprising a first 64-bit number that comprises a most significant 32-bit portion and a least significant 32-bit portion; right-shift the least significant 32-bit portion of the first 64-bit number according to the shift number; access a least significant 32-bit portion of a second 64-bit number; add the right-shifted least significant 32-bit portion of the first 64-bit number to the least significant 32-bit portion of the second 64-bit number, the resulting sum comprising a least significant 32-bit portion of a final result of a 64-bit scaled sum-of-product operation, the resulting sum further comprising a carry bit; store the least significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation; and store the carry bit; and in a second operation: access the first 32-bit number; access the second 32-bit number; access the shift number; multiply the first 32-bit number by the second 32-bit number, the resulting product comprising the first 64-bit number; right-shift the most significant 32-bit portion of the first 64-bit number according to the shift number; access a most significant 32-bit portion of the second 64-bit number; access the carry bit; add the most significant 32-bit portion of the second 64-bit number and the carry bit to the right-shifted most significant 32-bit portion of the first 64-bit number, the resulting sum comprising a most significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation; and store the most significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation.
2. The logic of claim 1, wherein the carry bit is a zero.
3. The logic of claim 1, wherein the carry bit is stored in a status register.
4. The logic of claim 1, wherein: the first and second 32-bit numbers are each stored in a register; the shift number is stored in a register; the least and most significant 32-bit portions of the second 64-bit number are each stored in memory locations; and the least and most significant 32-bit portions of the final result of the 64-bit scaled sum-of-product operation are each stored in memory locations.
5. The logic of claim 1, wherein the shift number comprises an immediate operand.
6. The logic of claim 1, wherein the first and second operations are each implemented using a single instruction.
7. The logic of claim 1, wherein the first and second operations each comprise two operations.
8. The logic of claim 7, wherein each of the operations of the first and second operations is implemented using a single instruction.
9. The logic of claim 1, encoded in a digital signal processor (DSP).
10. A method for performing 64-bit scaled sum-of-product operations in a 32-bit environment, the method comprising: in a first operation: accessing a first 32-bit number; accessing a second 32-bit number; accessing a shift number; multiplying the first 32-bit number by the second 32-bit number, the resulting product comprising a first 64-bit number that comprises a most significant 32-bit portion and a least significant 32-bit portion; right-shifting the least significant 32-bit portion of the first 64-bit number according to the shift number; accessing a least significant 32-bit portion of a second 64-bit number; adding the right-shifted least significant 32-bit portion of the first 64-bit number to the least significant 32-bit portion of the second 64-bit number, the resulting sum comprising a least significant 32-bit portion of a final result of a 64-bit scaled sum-of-product operation, the resulting sum further comprising a carry bit; storing the least significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation; and storing the carry bit; and in a second operation: accessing the first 32-bit number; accessing the second 32-bit number; accessing the shift number; multiplying the first 32-bit number by the second 32-bit number, the resulting product comprising the first 64-bit number; right-shifting the most significant 32-bit portion of the first 64-bit number according to the shift number; accessing a most significant 32-bit portion of the second 64-bit number; accessing the carry bit; adding the most significant 32-bit portion of the second 64-bit number and the carry bit to the right-shifted most significant 32-bit portion of the first 64-bit number, the resulting sum comprising a most significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation; and storing the most significant 32-bit portion of the final result of the 64-bit scaled sum-of-product operation.
11. The method of claim 10, wherein the carry bit is a zero.
12. The method of claim 10, wherein the carry bit is stored in a status register.
13. The method of claim 10, wherein: the first and second 32-bit numbers are each stored in a register; the shift number is stored in a register; the least and most significant 32-bit portions of the second 64-bit number are each stored in memory locations; and the least and most significant 32-bit portions of the final result of the 64-bit scaled sum-of-product operation are each stored in memory locations.
14. The method of claim 10, wherein the shift number comprises an immediate operand.
15. The method of claim 10, wherein the first and second operations are each implemented using a single instruction.
16. The method of claim 10, wherein the first and second operations each comprise two operations.
17. The method of claim 16, wherein each of the operations of the first and second operations is implemented using a single instruction.
18. The method of claim 10, executed by a digital signal processor (DSP).